-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Test: Hooks Glue Exporter #6721
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving as this is a significant improvement. But note that the condition for running the test is incorrect: we should look at whether we are testing AWS, not whether we are testing S3. Ideally we could fix this now, rather than carry tech debt for whenever we want to test on MinIO somehow.
esti/catalog_export_test.go
Outdated
TableSpec *hiveTableSpec | ||
} | ||
|
||
const glueExportScript = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document this please? Docs should say it's Lua (I think...), but also explain what {{...}}
in it means.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an alternative, Go has a statically-linked readonly filesystem available. If you used that you could put this in a file, statically link that file into the filesystem, and then the test could read from this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed all the templates / strings everything to files with go template!
esti/catalog_export_test.go
Outdated
exporter.export_glue(glue, args.catalog.db_name, args.table_source, args.catalog.table_input, action, {debug=true}) | ||
` | ||
|
||
const glueExporterAction = ` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too. AFAICT this is YAML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed all the templates / strings everything to files with go template!
esti/catalog_export_test.go
Outdated
|
||
func setupGlueClient(ctx context.Context, accessKeyID, secretAccessKey string) (*glue.Client, error) { | ||
cfg, err := config.LoadDefaultConfig(ctx, | ||
config.WithRegion("us-east-1"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have an envariable with the region.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Made it configurable (its tied to the location of the db)
esti/catalog_export_test.go
Outdated
require.Equal(t, http.StatusCreated, commitResp.StatusCode()) | ||
return commitResp.JSON201 | ||
} | ||
func genCSVData(cols []string, n int) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func genCSVData(cols []string, n int) string { | |
func genCSVData(cols []string, n int) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider using the standard encoding/csv package. Encoding CSVs like this is incredibly brittle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% done!
func testSymlinkS3Exporter(t *testing.T, ctx context.Context, repo string, tablePaths map[string]string, testData *exportHooksTestData) (string, string) { | ||
t.Helper() | ||
|
||
tableYaml, err := yaml.Marshal(&testData.TableSpec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just store testData.TableSpec
as an interface{}
, or possibly a map[string]interface{}
, and avoid having to write YAML in code, and then deserialize it back into a map?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No can do. Because that struct is not only used for YAML but also as configuration parameters during the test.
Therefore, .I would still have to maintain variables for all of those, why not just map 1 to 1 with the YAML anyway.
esti/catalog_export_test.go
Outdated
|
||
// upload action | ||
commit := uploadAndCommitObjects(t, ctx, repo, mainBranch, map[string]string{ | ||
"_lakefs_actions/animals_symlink.yaml": renderTplAsStr(t, testData, "action", symlinkExporterAction), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this upload need to be separate from the one on l.196?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't you're right consolidated everything into one upload, thanks!
bo := backoff.NewExponentialBackOff() | ||
bo.MaxInterval = 5 * time.Second | ||
bo.MaxElapsedTime = 30 * time.Second |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that ExponentialBackoff
is an exported type, why not just
bo := &backoff.ExponentialBackoff{
MaxInterval: 5 * time.Second,
MaxElapsedTime: 30 * time.Second,
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it has other variables inside that New func that we want to use as defaults and are not set when calling backoff.ExponentialBackoff{}
esti/catalog_export_test.go
Outdated
}) | ||
require.NoErrorf(t, err, "listing symlink files in storage: %s", symlinksPrefix) | ||
if len(listResp.Contents) == 0 { | ||
return fmt.Errorf("no objects found") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer Errors.new perhaps?
But I am not sure that returning this as a specific error gives more information than just returning the empty list -- which will also output the expected results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i want the retry to keep running if there are no objects found, that's why I'm returning an error.
Per the message itself changed to errors.new
// Symlinks export: lua script, table in _lakefs_tables, action file, mock table data in CSV form | ||
// Glue export: lua script, table in _lakefs_tables, action file | ||
func TestAWSCatalogExport(t *testing.T) { | ||
requireBlockstoreType(t, block.BlockstoreTypeS3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that this is the ideal thing to use. We are interested in running on AWS, not in running of S3. Suppose we were running on MinIO: presumably Glue would not work there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Thanks!
Important test! Thank you |
Closes #6686
Test flow
Symlink Exporter:
Glue Exporter - Relies on previous step and will create a glue table based on data in that commit: